Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 1039 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 113.8 KiB |
| Average record size in memory | 112.1 B |
Variable types
| Numeric | 13 |
|---|---|
| Categorical | 1 |
Alerts
| Number words female is highly correlated with Total words and 2 other fields | High correlation |
|---|---|
| Total words is highly correlated with Number words female and 5 other fields | High correlation |
| Number of words lead is highly correlated with Number words female and 3 other fields | High correlation |
| Difference in words lead and co-lead is highly correlated with Total words and 3 other fields | High correlation |
| Number of male actors is highly correlated with Total words and 2 other fields | High correlation |
| Number of female actors is highly correlated with Number words female and 1 other field | High correlation |
| Number words male is highly correlated with Total words and 3 other fields | High correlation |
| Mean Age Female is highly correlated with Mean Age Male and 1 other field | High correlation |
| Age Co-Lead is highly correlated with Mean Age Male and 1 other field | High correlation |
| Mean Age Male is highly correlated with Mean Age Female and 2 other fields | High correlation |
| Age Lead is highly correlated with Mean Age Male | High correlation |
| Number words female has 21 (2.0%) zeros | Zeros |
Reproduction
| Analysis started | 2022-11-18 22:46:56.655985 |
|---|---|
| Analysis finished | 2022-11-18 22:47:38.268438 |
| Duration | 41.61 seconds |
| Software version | pandas-profiling v3.4.0 |
Number words female
Real number (ℝ≥0)
| Distinct | 895 |
|---|---|
| Distinct (%) | 86.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2334.256015 |
| Minimum | 0 |
|---|---|
| Maximum | 17658 |
| Zeros | 21 |
| Zeros (%) | 2.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 237.7 |
| Q1 | 904 |
| median | 1711 |
| Q3 | 3030.5 |
| 95-th percentile | 6930.5 |
| Maximum | 17658 |
| Range | 17658 |
| Interquartile range (IQR) | 2126.5 |
Descriptive statistics
| Standard deviation | 2157.216744 |
|---|---|
| Coefficient of variation (CV) | 0.9241560179 |
| Kurtosis | 6.342849518 |
| Mean | 2334.256015 |
| Median Absolute Deviation (MAD) | 967 |
| Skewness | 2.123789262 |
| Sum | 2425292 |
| Variance | 4653584.081 |
| Monotonicity | Not monotonic |
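Several of the figures in the statistics tables above are derived from the others, so they can be cross-checked by hand. The sketch below does this using only numbers copied from the report; the variable names are illustrative, and it assumes the reported variance is the square of the reported standard deviation.

```python
# Values copied from the statistics tables above.
n_obs = 1039                 # Number of observations
total = 2425292              # Sum
std = 2157.216744            # Standard deviation
q1, q3 = 904, 3030.5         # Q1 and Q3
minimum, maximum = 0, 17658  # Minimum and Maximum

mean = total / n_obs             # Mean = Sum / number of observations
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance (assumed to be std squared)
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(f"mean={mean:.6f} cv={cv:.6f} variance={variance:.3f}")
print(f"iqr={iqr} range={value_range}")
```

The same identities apply to every numeric variable in the report, so the check can be repeated by swapping in another variable's values.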
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 21 | 2.0% |
| 1688 | 4 | 0.4% |
| 1858 | 3 | 0.3% |
| 1154 | 3 | 0.3% |
| 864 | 3 | 0.3% |
| 2094 | 3 | 0.3% |
| 1138 | 3 | 0.3% |
| 832 | 3 | 0.3% |
| 972 | 3 | 0.3% |
| 1120 | 3 | 0.3% |
| Other values (885) | 990 | 95.3% |
| Value | Count | Frequency (%) |
| 0 | 21 | 2.0% |
| 102 | 1 | 0.1% |
| 103 | 1 | 0.1% |
| 104 | 1 | 0.1% |
| 105 | 1 | 0.1% |
| 110 | 1 | 0.1% |
| 111 | 1 | 0.1% |
| 121 | 1 | 0.1% |
| 122 | 1 | 0.1% |
| 124 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 17658 | 1 | 0.1% |
| 13530 | 1 | 0.1% |
| 13054 | 1 | 0.1% |
| 12596 | 1 | 0.1% |
| 12226 | 1 | 0.1% |
| 12108 | 1 | 0.1% |
| 12002 | 1 | 0.1% |
| 11408 | 1 | 0.1% |
| 10688 | 1 | 0.1% |
| 10582 | 1 | 0.1% |
Total words
Real number (ℝ≥0)
| Distinct | 1008 |
|---|---|
| Distinct (%) | 97.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11004.36862 |
| Minimum | 1351 |
|---|---|
| Maximum | 67548 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 1351 |
|---|---|
| 5-th percentile | 3541.4 |
| Q1 | 6353.5 |
| median | 9147 |
| Q3 | 13966.5 |
| 95-th percentile | 23771.6 |
| Maximum | 67548 |
| Range | 66197 |
| Interquartile range (IQR) | 7613 |
Descriptive statistics
| Standard deviation | 6817.397413 |
|---|---|
| Coefficient of variation (CV) | 0.6195173613 |
| Kurtosis | 7.996606653 |
| Mean | 11004.36862 |
| Median Absolute Deviation (MAD) | 3402 |
| Skewness | 1.993266526 |
| Sum | 11433539 |
| Variance | 46476907.48 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 7074 | 3 | 0.3% |
| 8611 | 3 | 0.3% |
| 8174 | 2 | 0.2% |
| 13956 | 2 | 0.2% |
| 18864 | 2 | 0.2% |
| 2750 | 2 | 0.2% |
| 12286 | 2 | 0.2% |
| 13486 | 2 | 0.2% |
| 5713 | 2 | 0.2% |
| 6294 | 2 | 0.2% |
| Other values (998) | 1017 | 97.9% |
| Value | Count | Frequency (%) |
| 1351 | 1 | 0.1% |
| 1368 | 1 | 0.1% |
| 1371 | 1 | 0.1% |
| 1468 | 1 | 0.1% |
| 1522 | 1 | 0.1% |
| 1658 | 1 | 0.1% |
| 1672 | 1 | 0.1% |
| 1726 | 1 | 0.1% |
| 1870 | 1 | 0.1% |
| 1954 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 67548 | 1 | 0.1% |
| 57524 | 1 | 0.1% |
| 43768 | 1 | 0.1% |
| 41260 | 1 | 0.1% |
| 40928 | 1 | 0.1% |
| 39038 | 1 | 0.1% |
| 37778 | 1 | 0.1% |
| 33636 | 1 | 0.1% |
| 33402 | 1 | 0.1% |
| 32538 | 1 | 0.1% |
Number of words lead
Real number (ℝ≥0)
| Distinct | 964 |
|---|---|
| Distinct (%) | 92.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4108.256978 |
| Minimum | 318 |
|---|---|
| Maximum | 28102 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 318 |
|---|---|
| 5-th percentile | 1057.3 |
| Q1 | 2077 |
| median | 3297 |
| Q3 | 5227 |
| 95-th percentile | 9738.8 |
| Maximum | 28102 |
| Range | 27784 |
| Interquartile range (IQR) | 3150 |
Descriptive statistics
| Standard deviation | 2981.251156 |
|---|---|
| Coefficient of variation (CV) | 0.7256729976 |
| Kurtosis | 10.26313284 |
| Mean | 4108.256978 |
| Median Absolute Deviation (MAD) | 1463 |
| Skewness | 2.301127511 |
| Sum | 4268479 |
| Variance | 8887858.455 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2498 | 3 | 0.3% |
| 3976 | 3 | 0.3% |
| 789 | 3 | 0.3% |
| 1894 | 2 | 0.2% |
| 4356 | 2 | 0.2% |
| 2807 | 2 | 0.2% |
| 1343 | 2 | 0.2% |
| 2135 | 2 | 0.2% |
| 2891 | 2 | 0.2% |
| 1875 | 2 | 0.2% |
| Other values (954) | 1016 | 97.8% |
| Value | Count | Frequency (%) |
| 318 | 1 | 0.1% |
| 472 | 1 | 0.1% |
| 501 | 1 | 0.1% |
| 506 | 1 | 0.1% |
| 529 | 1 | 0.1% |
| 551 | 1 | 0.1% |
| 573 | 1 | 0.1% |
| 589 | 1 | 0.1% |
| 611 | 1 | 0.1% |
| 641 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 28102 | 1 | 0.1% |
| 26798 | 1 | 0.1% |
| 24376 | 1 | 0.1% |
| 19892 | 1 | 0.1% |
| 16148 | 1 | 0.1% |
| 15708 | 1 | 0.1% |
| 14794 | 1 | 0.1% |
| 14642 | 1 | 0.1% |
| 14490 | 1 | 0.1% |
| 14214 | 1 | 0.1% |
Difference in words lead and co-lead
Real number (ℝ≥0)
| Distinct | 951 |
|---|---|
| Distinct (%) | 91.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2525.024062 |
| Minimum | 1 |
|---|---|
| Maximum | 25822 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 165.8 |
| Q1 | 814.5 |
| median | 1834 |
| Q3 | 3364 |
| 95-th percentile | 7654.4 |
| Maximum | 25822 |
| Range | 25821 |
| Interquartile range (IQR) | 2549.5 |
Descriptive statistics
| Standard deviation | 2498.747279 |
|---|---|
| Coefficient of variation (CV) | 0.9895934527 |
| Kurtosis | 12.76299788 |
| Mean | 2525.024062 |
| Median Absolute Deviation (MAD) | 1184 |
| Skewness | 2.620408806 |
| Sum | 2623500 |
| Variance | 6243737.966 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 694 | 4 | 0.4% |
| 622 | 4 | 0.4% |
| 3200 | 3 | 0.3% |
| 597 | 3 | 0.3% |
| 352 | 3 | 0.3% |
| 519 | 3 | 0.3% |
| 503 | 3 | 0.3% |
| 3476 | 3 | 0.3% |
| 792 | 3 | 0.3% |
| 1540 | 2 | 0.2% |
| Other values (941) | 1008 | 97.0% |
| Value | Count | Frequency (%) |
| 1 | 1 | 0.1% |
| 4 | 1 | 0.1% |
| 11 | 1 | 0.1% |
| 13 | 1 | 0.1% |
| 14 | 1 | 0.1% |
| 15 | 1 | 0.1% |
| 16 | 1 | 0.1% |
| 17 | 1 | 0.1% |
| 23 | 1 | 0.1% |
| 24 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 25822 | 1 | 0.1% |
| 19138 | 1 | 0.1% |
| 18182 | 1 | 0.1% |
| 17822 | 1 | 0.1% |
| 13692 | 1 | 0.1% |
| 12921 | 1 | 0.1% |
| 12060 | 1 | 0.1% |
| 11436 | 1 | 0.1% |
| 11274 | 1 | 0.1% |
| 11058 | 1 | 0.1% |
Number of male actors
Real number (ℝ≥0)
| Distinct | 27 |
|---|---|
| Distinct (%) | 2.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.767083734 |
| Minimum | 1 |
|---|---|
| Maximum | 29 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 5 |
| median | 7 |
| Q3 | 10 |
| 95-th percentile | 15 |
| Maximum | 29 |
| Range | 28 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.901439172 |
|---|---|
| Coefficient of variation (CV) | 0.5023042503 |
| Kurtosis | 3.006479544 |
| Mean | 7.767083734 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 1.222559422 |
| Sum | 8070 |
| Variance | 15.22122761 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=27)
| Value | Count | Frequency (%) |
| 6 | 120 | 11.5% |
| 5 | 115 | 11.1% |
| 7 | 111 | 10.7% |
| 4 | 111 | 10.7% |
| 8 | 109 | 10.5% |
| 9 | 88 | 8.5% |
| 10 | 78 | 7.5% |
| 11 | 57 | 5.5% |
| 3 | 52 | 5.0% |
| 12 | 43 | 4.1% |
| Other values (17) | 155 | 14.9% |
| Value | Count | Frequency (%) |
| 1 | 14 | 1.3% |
| 2 | 30 | 2.9% |
| 3 | 52 | 5.0% |
| 4 | 111 | 10.7% |
| 5 | 115 | 11.1% |
| 6 | 120 | 11.5% |
| 7 | 111 | 10.7% |
| 8 | 109 | 10.5% |
| 9 | 88 | 8.5% |
| 10 | 78 | 7.5% |
| Value | Count | Frequency (%) |
| 29 | 1 | 0.1% |
| 28 | 1 | 0.1% |
| 27 | 1 | 0.1% |
| 26 | 2 | 0.2% |
| 23 | 1 | 0.1% |
| 22 | 2 | 0.2% |
| 21 | 3 | 0.3% |
| 20 | 2 | 0.2% |
| 19 | 1 | 0.1% |
| 18 | 6 | 0.6% |
Year
Real number (ℝ≥0)
| Distinct | 51 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1999.862368 |
| Minimum | 1939 |
|---|---|
| Maximum | 2015 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 1939 |
|---|---|
| 5-th percentile | 1982 |
| Q1 | 1994 |
| median | 2000 |
| Q3 | 2009 |
| 95-th percentile | 2013 |
| Maximum | 2015 |
| Range | 76 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 10.40663225 |
|---|---|
| Coefficient of variation (CV) | 0.005203674222 |
| Kurtosis | 3.537650153 |
| Mean | 1999.862368 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | -1.267261948 |
| Sum | 2077857 |
| Variance | 108.2979948 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2009 | 62 | 6.0% |
| 2010 | 58 | 5.6% |
| 1999 | 55 | 5.3% |
| 1997 | 47 | 4.5% |
| 2000 | 45 | 4.3% |
| 2011 | 44 | 4.2% |
| 1998 | 41 | 3.9% |
| 2002 | 37 | 3.6% |
| 2001 | 37 | 3.6% |
| 2008 | 36 | 3.5% |
| Other values (41) | 577 | 55.5% |
| Value | Count | Frequency (%) |
| 1939 | 2 | 0.2% |
| 1949 | 1 | 0.1% |
| 1954 | 2 | 0.2% |
| 1958 | 2 | 0.2% |
| 1959 | 2 | 0.2% |
| 1960 | 1 | 0.1% |
| 1968 | 2 | 0.2% |
| 1972 | 3 | 0.3% |
| 1973 | 4 | 0.4% |
| 1974 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 2015 | 19 | 1.8% |
| 2014 | 24 | 2.3% |
| 2013 | 31 | 3.0% |
| 2012 | 27 | 2.6% |
| 2011 | 44 | 4.2% |
| 2010 | 58 | 5.6% |
| 2009 | 62 | 6.0% |
| 2008 | 36 | 3.5% |
| 2007 | 32 | 3.1% |
| 2006 | 22 | 2.1% |
Number of female actors
Real number (ℝ≥0)
| Distinct | 13 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.507218479 |
| Minimum | 1 |
|---|---|
| Maximum | 16 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 8 |
| Maximum | 16 |
| Range | 15 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.08852629 |
|---|---|
| Coefficient of variation (CV) | 0.5954936375 |
| Kurtosis | 2.238918326 |
| Mean | 3.507218479 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.221849332 |
| Sum | 3644 |
| Variance | 4.361942063 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=13)
| Value | Count | Frequency (%) |
| 2 | 227 | 21.8% |
| 3 | 208 | 20.0% |
| 4 | 184 | 17.7% |
| 1 | 159 | 15.3% |
| 5 | 98 | 9.4% |
| 6 | 71 | 6.8% |
| 7 | 39 | 3.8% |
| 8 | 26 | 2.5% |
| 9 | 12 | 1.2% |
| 10 | 7 | 0.7% |
| Other values (3) | 8 | 0.8% |
| Value | Count | Frequency (%) |
| 1 | 159 | 15.3% |
| 2 | 227 | 21.8% |
| 3 | 208 | 20.0% |
| 4 | 184 | 17.7% |
| 5 | 98 | 9.4% |
| 6 | 71 | 6.8% |
| 7 | 39 | 3.8% |
| 8 | 26 | 2.5% |
| 9 | 12 | 1.2% |
| 10 | 7 | 0.7% |
| Value | Count | Frequency (%) |
| 16 | 1 | 0.1% |
| 12 | 3 | 0.3% |
| 11 | 4 | 0.4% |
| 10 | 7 | 0.7% |
| 9 | 12 | 1.2% |
| 8 | 26 | 2.5% |
| 7 | 39 | 3.8% |
| 6 | 71 | 6.8% |
| 5 | 98 | 9.4% |
| 4 | 184 | 17.7% |
Number words male
Real number (ℝ≥0)
| Distinct | 960 |
|---|---|
| Distinct (%) | 92.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4561.85563 |
| Minimum | 0 |
|---|---|
| Maximum | 31146 |
| Zeros | 5 |
| Zeros (%) | 0.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 867.9 |
| Q1 | 2139.5 |
| median | 3824 |
| Q3 | 5887.5 |
| 95-th percentile | 10755.2 |
| Maximum | 31146 |
| Range | 31146 |
| Interquartile range (IQR) | 3748 |
Descriptive statistics
| Standard deviation | 3417.855987 |
|---|---|
| Coefficient of variation (CV) | 0.7492249348 |
| Kurtosis | 7.549387365 |
| Mean | 4561.85563 |
| Median Absolute Deviation (MAD) | 1840 |
| Skewness | 2.026475006 |
| Sum | 4739768 |
| Variance | 11681739.55 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 5 | 0.5% |
| 3468 | 3 | 0.3% |
| 3114 | 3 | 0.3% |
| 1858 | 3 | 0.3% |
| 3299 | 3 | 0.3% |
| 1171 | 2 | 0.2% |
| 5836 | 2 | 0.2% |
| 4552 | 2 | 0.2% |
| 156 | 2 | 0.2% |
| 2240 | 2 | 0.2% |
| Other values (950) | 1012 | 97.4% |
| Value | Count | Frequency (%) |
| 0 | 5 | 0.5% |
| 113 | 1 | 0.1% |
| 114 | 1 | 0.1% |
| 130 | 1 | 0.1% |
| 156 | 2 | 0.2% |
| 186 | 1 | 0.1% |
| 204 | 1 | 0.1% |
| 225 | 1 | 0.1% |
| 232 | 1 | 0.1% |
| 242 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 31146 | 1 | 0.1% |
| 25628 | 1 | 0.1% |
| 22650 | 1 | 0.1% |
| 22422 | 1 | 0.1% |
| 20464 | 1 | 0.1% |
| 20044 | 1 | 0.1% |
| 18496 | 1 | 0.1% |
| 17792 | 1 | 0.1% |
| 17128 | 1 | 0.1% |
| 16956 | 1 | 0.1% |
Gross
Real number (ℝ≥0)
| Distinct | 317 |
|---|---|
| Distinct (%) | 30.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 111.1491819 |
| Minimum | 0 |
|---|---|
| Maximum | 1798 |
| Zeros | 10 |
| Zeros (%) | 1.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 22 |
| median | 60 |
| Q3 | 143.5 |
| 95-th percentile | 374.2 |
| Maximum | 1798 |
| Range | 1798 |
| Interquartile range (IQR) | 121.5 |
Descriptive statistics
| Standard deviation | 151.7615507 |
|---|---|
| Coefficient of variation (CV) | 1.365386125 |
| Kurtosis | 24.05202261 |
| Mean | 111.1491819 |
| Median Absolute Deviation (MAD) | 48 |
| Skewness | 3.786103854 |
| Sum | 115484 |
| Variance | 23031.56828 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 27 | 2.6% |
| 2 | 24 | 2.3% |
| 8 | 18 | 1.7% |
| 7 | 17 | 1.6% |
| 11 | 14 | 1.3% |
| 34 | 14 | 1.3% |
| 32 | 13 | 1.3% |
| 6 | 13 | 1.3% |
| 5 | 12 | 1.2% |
| 4 | 12 | 1.2% |
| Other values (307) | 875 | 84.2% |
| Value | Count | Frequency (%) |
| 0 | 10 | 1.0% |
| 1 | 27 | 2.6% |
| 2 | 24 | 2.3% |
| 3 | 9 | 0.9% |
| 4 | 12 | 1.2% |
| 5 | 12 | 1.2% |
| 6 | 13 | 1.3% |
| 7 | 17 | 1.6% |
| 8 | 18 | 1.7% |
| 9 | 6 | 0.6% |
| Value | Count | Frequency (%) |
| 1798 | 1 | 0.1% |
| 1249 | 1 | 0.1% |
| 1103 | 1 | 0.1% |
| 937 | 1 | 0.1% |
| 882 | 1 | 0.1% |
| 880 | 2 | 0.2% |
| 853 | 1 | 0.1% |
| 844 | 1 | 0.1% |
| 839 | 1 | 0.1% |
| 813 | 1 | 0.1% |
Mean Age Male
Real number (ℝ≥0)
| Distinct | 542 |
|---|---|
| Distinct (%) | 52.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 42.35376582 |
| Minimum | 19 |
|---|---|
| Maximum | 71 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 19 |
|---|---|
| 5-th percentile | 28.8 |
| Q1 | 37.48076923 |
| median | 42.6 |
| Q3 | 47.33333333 |
| 95-th percentile | 54.72857143 |
| Maximum | 71 |
| Range | 52 |
| Interquartile range (IQR) | 9.852564103 |
Descriptive statistics
| Standard deviation | 7.81710979 |
|---|---|
| Coefficient of variation (CV) | 0.1845670541 |
| Kurtosis | 0.2906314251 |
| Mean | 42.35376582 |
| Median Absolute Deviation (MAD) | 4.971428571 |
| Skewness | 0.05257910605 |
| Sum | 44005.56269 |
| Variance | 61.10720546 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 42 | 20 | 1.9% |
| 43 | 14 | 1.3% |
| 40 | 12 | 1.2% |
| 41.5 | 11 | 1.1% |
| 46 | 10 | 1.0% |
| 48 | 10 | 1.0% |
| 38 | 9 | 0.9% |
| 37 | 8 | 0.8% |
| 36 | 8 | 0.8% |
| 45.66666667 | 8 | 0.8% |
| Other values (532) | 929 | 89.4% |
| Value | Count | Frequency (%) |
| 19 | 1 | 0.1% |
| 20.66666667 | 1 | 0.1% |
| 21 | 1 | 0.1% |
| 22 | 1 | 0.1% |
| 23 | 1 | 0.1% |
| 23.5 | 1 | 0.1% |
| 23.75 | 1 | 0.1% |
| 24 | 2 | 0.2% |
| 24.58333333 | 1 | 0.1% |
| 24.6 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 71 | 1 | 0.1% |
| 70 | 1 | 0.1% |
| 69 | 1 | 0.1% |
| 67.5 | 1 | 0.1% |
| 66.33333333 | 1 | 0.1% |
| 66 | 1 | 0.1% |
| 63 | 1 | 0.1% |
| 62.33333333 | 1 | 0.1% |
| 62.16666667 | 1 | 0.1% |
| 62 | 1 | 0.1% |
Mean Age Female
Real number (ℝ≥0)
| Distinct | 274 |
|---|---|
| Distinct (%) | 26.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.92958774 |
| Minimum | 11 |
|---|---|
| Maximum | 81.33333333 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 11 |
|---|---|
| 5-th percentile | 23.33333333 |
| Q1 | 29.5 |
| median | 35 |
| Q3 | 41.5 |
| 95-th percentile | 51.35 |
| Maximum | 81.33333333 |
| Range | 70.33333333 |
| Interquartile range (IQR) | 12 |
Descriptive statistics
| Standard deviation | 8.957192996 |
|---|---|
| Coefficient of variation (CV) | 0.2492985185 |
| Kurtosis | 1.306353151 |
| Mean | 35.92958774 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | 0.7384501139 |
| Sum | 37330.84167 |
| Variance | 80.23130636 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 37 | 31 | 3.0% |
| 35 | 28 | 2.7% |
| 30 | 25 | 2.4% |
| 29 | 25 | 2.4% |
| 28 | 25 | 2.4% |
| 34 | 24 | 2.3% |
| 33 | 22 | 2.1% |
| 32 | 21 | 2.0% |
| 31 | 21 | 2.0% |
| 27 | 21 | 2.0% |
| Other values (264) | 796 | 76.6% |
| Value | Count | Frequency (%) |
| 11 | 1 | 0.1% |
| 13 | 2 | 0.2% |
| 16 | 2 | 0.2% |
| 17 | 2 | 0.2% |
| 18 | 2 | 0.2% |
| 18.66666667 | 1 | 0.1% |
| 19 | 2 | 0.2% |
| 19.25 | 1 | 0.1% |
| 19.66666667 | 2 | 0.2% |
| 20 | 2 | 0.2% |
| Value | Count | Frequency (%) |
| 81.33333333 | 1 | 0.1% |
| 71 | 1 | 0.1% |
| 70.5 | 1 | 0.1% |
| 69 | 1 | 0.1% |
| 67.5 | 1 | 0.1% |
| 66.33333333 | 1 | 0.1% |
| 65 | 2 | 0.2% |
| 63 | 3 | 0.3% |
| 62.66666667 | 1 | 0.1% |
| 62.5 | 1 | 0.1% |
Age Lead
Real number (ℝ≥0)
| Distinct | 68 |
|---|---|
| Distinct (%) | 6.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.71607315 |
| Minimum | 11 |
|---|---|
| Maximum | 81 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 11 |
|---|---|
| 5-th percentile | 20 |
| Q1 | 30 |
| median | 38 |
| Q3 | 46 |
| 95-th percentile | 62 |
| Maximum | 81 |
| Range | 70 |
| Interquartile range (IQR) | 16 |
Descriptive statistics
| Standard deviation | 12.28590219 |
|---|---|
| Coefficient of variation (CV) | 0.3173333758 |
| Kurtosis | 0.2450172233 |
| Mean | 38.71607315 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.5248594002 |
| Sum | 40226 |
| Variance | 150.9433927 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 32 | 42 | 4.0% |
| 36 | 40 | 3.8% |
| 29 | 39 | 3.8% |
| 33 | 39 | 3.8% |
| 38 | 39 | 3.8% |
| 43 | 38 | 3.7% |
| 34 | 38 | 3.7% |
| 41 | 36 | 3.5% |
| 42 | 36 | 3.5% |
| 30 | 34 | 3.3% |
| Other values (58) | 658 | 63.3% |
| Value | Count | Frequency (%) |
| 11 | 3 | 0.3% |
| 12 | 3 | 0.3% |
| 13 | 1 | 0.1% |
| 14 | 4 | 0.4% |
| 15 | 2 | 0.2% |
| 16 | 4 | 0.4% |
| 17 | 7 | 0.7% |
| 18 | 10 | 1.0% |
| 19 | 8 | 0.8% |
| 20 | 12 | 1.2% |
| Value | Count | Frequency (%) |
| 81 | 1 | 0.1% |
| 80 | 1 | 0.1% |
| 78 | 2 | 0.2% |
| 77 | 1 | 0.1% |
| 76 | 3 | 0.3% |
| 75 | 1 | 0.1% |
| 74 | 1 | 0.1% |
| 72 | 2 | 0.2% |
| 71 | 1 | 0.1% |
| 70 | 4 | 0.4% |
Age Co-Lead
Real number (ℝ≥0)
| Distinct | 73 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.48604427 |
| Minimum | 7 |
|---|---|
| Maximum | 85 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.2 KiB |
Quantile statistics
| Minimum | 7 |
|---|---|
| 5-th percentile | 20 |
| Q1 | 28 |
| median | 34 |
| Q3 | 41 |
| 95-th percentile | 60 |
| Maximum | 85 |
| Range | 78 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 12.04669574 |
|---|---|
| Coefficient of variation (CV) | 0.3394769969 |
| Kurtosis | 1.497207445 |
| Mean | 35.48604427 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.9967891749 |
| Sum | 36870 |
| Variance | 145.1228783 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 34 | 56 | 5.4% |
| 28 | 54 | 5.2% |
| 31 | 48 | 4.6% |
| 32 | 42 | 4.0% |
| 27 | 41 | 3.9% |
| 29 | 40 | 3.8% |
| 35 | 40 | 3.8% |
| 37 | 39 | 3.8% |
| 30 | 37 | 3.6% |
| 39 | 36 | 3.5% |
| Other values (63) | 606 | 58.3% |
| Value | Count | Frequency (%) |
| 7 | 1 | 0.1% |
| 8 | 1 | 0.1% |
| 9 | 2 | 0.2% |
| 10 | 3 | 0.3% |
| 11 | 2 | 0.2% |
| 12 | 4 | 0.4% |
| 13 | 2 | 0.2% |
| 14 | 4 | 0.4% |
| 15 | 2 | 0.2% |
| 16 | 3 | 0.3% |
| Value | Count | Frequency (%) |
| 85 | 1 | 0.1% |
| 84 | 1 | 0.1% |
| 80 | 1 | 0.1% |
| 79 | 2 | 0.2% |
| 77 | 2 | 0.2% |
| 75 | 1 | 0.1% |
| 74 | 4 | 0.4% |
| 72 | 1 | 0.1% |
| 71 | 2 | 0.2% |
| 70 | 2 | 0.2% |
Lead
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.2 KiB |
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.488931665 |
| Min length | 4 |
Characters and Unicode
| Total characters | 4664 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Female |
|---|---|
| 2nd row | Male |
| 3rd row | Male |
| 4th row | Male |
| 5th row | Male |
Common Values
| Value | Count | Frequency (%) |
| Male | 785 | 75.6% |
| Female | 254 | 24.4% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| male | 785 | 75.6% |
| female | 254 | 24.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 1293 | 27.7% |
| a | 1039 | 22.3% |
| l | 1039 | 22.3% |
| M | 785 | 16.8% |
| F | 254 | 5.4% |
| m | 254 | 5.4% |
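The length and character figures reported for Lead follow directly from its two category counts. As a rough sketch, the code below rebuilds the column from the counts in the Common Values table and recomputes the totals with the standard library; the variable names are illustrative, and the use of `unicodedata.category` is an assumption about how such categories can be derived, not a statement about how the report computes them.

```python
from collections import Counter
import unicodedata

# Category counts copied from the Common Values table for Lead.
counts = {"Male": 785, "Female": 254}

# Rebuild the column from its counts; row order does not affect these summaries.
values = [label for label, n in counts.items() for _ in range(n)]

lengths = [len(v) for v in values]
total_chars = sum(lengths)                 # Total characters
mean_length = total_chars / len(values)    # Mean length

text = "".join(values)
char_counts = Counter(text)                # Occurrences of each character
# Unicode general category per character: "Ll" is lowercase, "Lu" is uppercase.
category_counts = Counter(unicodedata.category(c) for c in text)

print(total_chars, min(lengths), max(lengths), mean_length)
print(char_counts.most_common())
print(dict(category_counts))
```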
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 3625 | 77.7% |
| Uppercase Letter | 1039 | 22.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 1293 | 35.7% |
| a | 1039 | 28.7% |
| l | 1039 | 28.7% |
| m | 254 | 7.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 785 | 75.6% |
| F | 254 | 24.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 4664 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 1293 | 27.7% |
| a | 1039 | 22.3% |
| l | 1039 | 22.3% |
| M | 785 | 16.8% |
| F | 254 | 5.4% |
| m | 254 | 5.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 4664 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 1293 | 27.7% |
| a | 1039 | 22.3% |
| l | 1039 | 22.3% |
| M | 785 | 16.8% |
| F | 254 | 5.4% |
| m | 254 | 5.4% |
Auto
The auto setting is an easily interpretable pairwise column metric that picks a method from the types of the two columns (vartype-vartype: method): categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
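As a minimal sketch of the calculation just described, the pure-Python code below ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks. Tied values receive the average of their positions, which is an assumption about tie handling. The inputs are the Total words and Number words male values from the ten first rows listed in this report, so the result illustrates the method only and is not the coefficient computed over all 1039 observations. The function names are illustrative.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel, so plain sums are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# First ten rows of the dataset, copied from the First rows table.
total_words = [6394, 8780, 4176, 9855, 7688, 5872, 5322, 6098, 8851, 9626]
words_male = [2631, 5236, 3079, 5342, 2536, 2889, 2631, 3057, 3952, 5403]

rho = spearman(total_words, words_male)
print(f"rho over the first ten rows: {rho:.3f}")
```

Because only ranks enter the formula, any strictly increasing transformation of either variable leaves ρ unchanged.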
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
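The formula and the invariance property can be sketched in a few lines of pure Python. The inputs are the Age Lead and Age Co-Lead values from the ten first rows listed in this report, so the number printed is illustrative only and does not reproduce the report's coefficient. The function name and the particular shift and scale constants are arbitrary choices for the example.

```python
import math

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors cancel, so plain sums are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

# First ten rows of the dataset, copied from the First rows table.
age_lead = [46, 58, 46, 33, 36, 55, 61, 48, 33, 20]
age_co_lead = [65, 34, 37, 23, 39, 41, 25, 31, 27, 26]

r = pearson(age_lead, age_co_lead)

# Shift and rescale each variable separately (positive scale factors).
shifted_lead = [2 * v + 3 for v in age_lead]
shifted_co_lead = [0.5 * v - 7 for v in age_co_lead]
r_shifted = pearson(shifted_lead, shifted_co_lead)

print(f"r = {r:.6f}, after shift and scale = {r_shifted:.6f}")
```

A negative scale factor on exactly one variable would flip the sign of r while leaving its magnitude unchanged.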
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
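The pair-counting definition above can be written out directly. The sketch below implements that plain form (often called τ-a), in which tied pairs count as neither concordant nor discordant; library implementations commonly apply a tie correction, so this is not necessarily the variant behind the report's figures. The inputs are the Number of male actors and Number of female actors values from the ten first rows listed in this report, and the function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1    # both variables order the pair the same way
        elif product < 0:
            discordant += 1    # the variables order the pair in opposite ways
        # product == 0 means a tie in x or y; it is counted in neither group
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# First ten rows of the dataset, copied from the First rows table.
male_actors = [2, 9, 7, 12, 8, 11, 6, 9, 13, 9]
female_actors = [5, 4, 1, 2, 4, 4, 3, 2, 2, 6]

tau = kendall_tau_a(male_actors, female_actors)
print(f"tau-a over the first ten rows: {tau:.3f}")
```

The double loop over pairs is quadratic in the number of observations, which is fine for a worked example but slow for large columns.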
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The Phik project provides extensive documentation.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | Number words female | Total words | Number of words lead | Difference in words lead and co-lead | Number of male actors | Year | Number of female actors | Number words male | Gross | Mean Age Male | Mean Age Female | Age Lead | Age Co-Lead | Lead |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1512 | 6394 | 2251.0 | 343 | 2 | 1995 | 5 | 2631 | 142.0 | 51.500000 | 42.333333 | 46.0 | 65.0 | Female |
| 1 | 1524 | 8780 | 2020.0 | 1219 | 9 | 2001 | 4 | 5236 | 37.0 | 39.125000 | 29.333333 | 58.0 | 34.0 | Male |
| 2 | 155 | 4176 | 942.0 | 787 | 7 | 1968 | 1 | 3079 | 376.0 | 42.500000 | 37.000000 | 46.0 | 37.0 | Male |
| 3 | 1073 | 9855 | 3440.0 | 2623 | 12 | 2002 | 2 | 5342 | 19.0 | 35.222222 | 21.500000 | 33.0 | 23.0 | Male |
| 4 | 1317 | 7688 | 3835.0 | 3149 | 8 | 1988 | 4 | 2536 | 40.0 | 45.250000 | 45.000000 | 36.0 | 39.0 | Male |
| 5 | 1492 | 5872 | 1491.0 | 994 | 11 | 1997 | 4 | 2889 | 327.0 | 45.909091 | 36.500000 | 55.0 | 41.0 | Male |
| 6 | 1500 | 5322 | 1191.0 | 287 | 6 | 1980 | 3 | 2631 | 269.0 | 47.000000 | 24.500000 | 61.0 | 25.0 | Male |
| 7 | 349 | 6098 | 2692.0 | 2472 | 9 | 1988 | 2 | 3057 | 53.0 | 43.000000 | 31.000000 | 48.0 | 31.0 | Male |
| 8 | 857 | 8851 | 4042.0 | 3476 | 13 | 2001 | 2 | 3952 | 89.0 | 47.416667 | 28.500000 | 33.0 | 27.0 | Male |
| 9 | 2619 | 9626 | 1604.0 | 869 | 9 | 1973 | 6 | 5403 | 565.0 | 26.500000 | 22.000000 | 20.0 | 26.0 | Male |
Last rows
| | Number words female | Total words | Number of words lead | Difference in words lead and co-lead | Number of male actors | Year | Number of female actors | Number words male | Gross | Mean Age Male | Mean Age Female | Age Lead | Age Co-Lead | Lead |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1029 | 999 | 3951 | 1146.0 | 147 | 5 | 2010 | 1 | 1806 | 74.0 | 49.000000 | 35.000000 | 47.0 | 35.0 | Male |
| 1030 | 1748 | 8036 | 2065.0 | 656 | 10 | 2013 | 2 | 4223 | 78.0 | 50.333333 | 25.000000 | 33.0 | 36.0 | Male |
| 1031 | 2276 | 9351 | 3422.0 | 2209 | 9 | 2014 | 3 | 3653 | 200.0 | 49.888889 | 34.333333 | 45.0 | 32.0 | Male |
| 1032 | 899 | 1986 | 974.0 | 288 | 2 | 1995 | 2 | 113 | 214.0 | 61.500000 | 46.000000 | 42.0 | 61.0 | Male |
| 1033 | 127 | 1726 | 1232.0 | 1105 | 3 | 1981 | 1 | 367 | 194.0 | 48.666667 | 33.000000 | 54.0 | 33.0 | Male |
| 1034 | 303 | 2398 | 1334.0 | 1166 | 5 | 1973 | 2 | 761 | 174.0 | 43.200000 | 31.000000 | 46.0 | 24.0 | Male |
| 1035 | 632 | 8404 | 1952.0 | 187 | 6 | 1992 | 2 | 5820 | 172.0 | 37.166667 | 24.000000 | 21.0 | 34.0 | Female |
| 1036 | 1326 | 2750 | 877.0 | 356 | 2 | 2000 | 3 | 547 | 53.0 | 27.500000 | 27.666667 | 28.0 | 25.0 | Male |
| 1037 | 462 | 3994 | 775.0 | 52 | 8 | 1996 | 3 | 2757 | 32.0 | 42.857143 | 38.500000 | 29.0 | 32.0 | Female |
| 1038 | 2735 | 11946 | 3410.0 | 1536 | 13 | 2007 | 4 | 5801 | 32.0 | 44.090909 | 50.000000 | 38.0 | 48.0 | Male |